feat: add Vision Transformer (ViT) implementation for image classification #13332
Closed
devvratpathak wants to merge 3 commits into TheAlgorithms:master
…features section
- Add comprehensive table of contents for easy navigation
- Include detailed installation steps with virtual environment setup
- Add usage examples showing how to run and import algorithms
- Create features section listing all algorithm categories
- Add explicit license section with MIT License information
- Expand contributing section with quick start guide
- Add about section explaining repository purpose
Fixes TheAlgorithms#13111
…features section
- Add comprehensive table of contents for easy navigation
- Include detailed installation steps with virtual environment setup
- Add usage examples showing how to run and import algorithms
- Create features section listing all algorithm categories
- Add explicit license section with MIT License information
- Expand contributing section with quick start guide
- Add about section explaining repository purpose
Fixes TheAlgorithms#13111
…ation
- Implement complete ViT architecture with patch embedding
- Add positional encoding with learnable CLS token
- Include scaled dot-product attention mechanism
- Implement transformer encoder blocks with layer normalization
- Add feed-forward network with GELU activation
- Include comprehensive docstrings and type hints
- Add doctests for all functions
- Provide example usage demonstrating the complete pipeline
Fixes TheAlgorithms#13326
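For readers skimming the commit message, the attention, normalization, and feed-forward pieces it lists can be sketched roughly as follows. This is not the code from this PR: all function and parameter names are hypothetical, the block uses a single attention head, a pre-norm layout, the tanh approximation of GELU, and a layer normalization without learnable scale and shift, all of which are assumptions made to keep the example short.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(
    q: np.ndarray, k: np.ndarray, v: np.ndarray
) -> np.ndarray:
    """Compute softmax(q k^T / sqrt(d_k)) v for 2-D inputs of shape (tokens, d)."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ v


def gelu(x: np.ndarray) -> np.ndarray:
    """Tanh approximation of GELU (assumed here; the exact form uses erf)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Normalize each token to zero mean and unit variance (no learned scale/shift)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def encoder_block(
    x: np.ndarray,
    w_q: np.ndarray,
    w_k: np.ndarray,
    w_v: np.ndarray,
    w_o: np.ndarray,
    w_1: np.ndarray,
    w_2: np.ndarray,
) -> np.ndarray:
    """Single-head, pre-norm encoder block with residual connections (hypothetical)."""
    normed = layer_norm(x)
    attended = scaled_dot_product_attention(normed @ w_q, normed @ w_k, normed @ w_v)
    x = x + attended @ w_o  # residual around attention
    hidden = gelu(layer_norm(x) @ w_1)  # feed-forward expansion with GELU
    return x + hidden @ w_2  # residual around the feed-forward network
```

With a token matrix of shape `(tokens, d)`, square `(d, d)` attention weights, and feed-forward weights of shape `(d, 4d)` and `(4d, d)`, `encoder_block` returns an array of the same shape as its input, so blocks can be stacked.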
Closing this pull request as invalid
@devvratpathak, this pull request is being closed as none of the checkboxes have been marked. It is important that you go through the checklist and mark the ones relevant to this pull request. Please read the Contributing guidelines. If you're facing any problem on how to mark a checkbox, please read the following instructions:
NOTE: Only |
Description
This PR adds a comprehensive Vision Transformer (ViT) implementation to the computer_vision folder for image classification tasks.
Implementation Details
Implementation of the Vision Transformer architecture from "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" (Dosovitskiy et al., 2020).
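To make the "16x16 words" idea concrete, here is a minimal NumPy sketch of the patch-embedding step, assuming a channels-last `(H, W, C)` image. It is an illustration only, not the code submitted in this PR; `patchify`, `embed_patches`, and their parameters are hypothetical names. A 224x224 RGB image split into 16x16 patches yields 14 x 14 = 196 patches of 16 x 16 x 3 = 768 values each.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping square patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row.
    """
    height, width, channels = image.shape
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    rows, cols = height // patch_size, width // patch_size
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    patches = image.reshape(rows, patch_size, cols, patch_size, channels)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * channels)


def embed_patches(
    patches: np.ndarray,
    projection: np.ndarray,
    cls_token: np.ndarray,
    position_embeddings: np.ndarray,
) -> np.ndarray:
    """Project patches, prepend the CLS token, and add position embeddings.

    Shapes (hypothetical): patches (N, P), projection (P, D),
    cls_token (1, D), position_embeddings (N + 1, D).
    """
    tokens = patches @ projection  # (N, D) linear projection of each patch
    tokens = np.vstack([cls_token, tokens])  # (N + 1, D), CLS token first
    return tokens + position_embeddings
```

In a trained model the projection, CLS token, and position embeddings would be learned parameters; here they are plain arrays passed in by the caller.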
Core Components:
Code Quality:
__main__ block
Example Usage: